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1 . Introduction 

The large-scale structure of the Universe traced by the three-dimensional distribution of 
galaxies shows intriguing patterns: filaments and sheet-like structures connecting in huge 
clusters surround nearly empty regions, the so-called voids. Surveys of galaxies are created 
by measuring for each galaxy in addition to its angular position on the sky its distance, 
estimated from their recession velocities, within the framework of a cosmological model. 
However, the measured recession velocities are not only due to the Hubble expansion, they 
appear contaminated by the line-of-sight contribution of the dynamical velocity of a galaxy, 
due to the local gravitational effects, so the distances of galaxies are in error. Galaxy 
surveys based on recession velocities are called 'redshift space' maps, and although they are 
distorted versions of the three-dimensional distribution of galaxies, distance errors are not 
as serious as to change the overall picture of th e large-scale structure. 

An overview of such galaxy maps is given in lMartmez and Saar ( 2002 ). As an example, 



we present here a m ap from a recently completed 2dF Galaxy Redshift Survey (2dFGRS, 



Golless et al.l (|200ir) ). This survey measured the redshifts (recession velocities) of galaxies 
in about 1500 square degrees, up to the distances of about 700 ft~^Mp({3 (corresponding to 
a redshift z = 0.2 for the standard cosmological model). The redshifts were measured in 
two different regions of the sky; Fig.[T]shows the positions of galaxies in two 2.6° thick slices 
from both regions. 

^Address for correspondence: Vicent J. Martinez, Observatori Astronomic de la Universitat de 
Valencia, Edifici d'Instituts d'lnvestigacio, Polfgon La Coma s/n, 46980 Paterna, Valencia, Spain. 
E-mail: martinez@uv.es 

J Distances between galaxies are usually measured in megaparsecs (Mpc); IMpc ~ 3 • 10^'* cm. 
The constant h is the dimensionless Hubble parameter; the latest determinations give for its value 
h X 0.71. 
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Fig. 1. Galaxy map for two 2dFGRS slices. The observers (we) are situated at the centre of the 
figure. Both slices are thin, with the thickness of 2.6°. The distances are given in redshifts z\ ap- 
proximately, the physical distance D 3000 /i"^ 2:Mpc. The numbers along the arcs show the right 
ascension (in hours). The filamentary network of galaxies is clearly seen; the disappearance of 
structure with depth (towards the sides of the figure) is caused by luminosity selection. 



The most characteristic feature of ah three-dimensional galaxy cartographies are the 
remarkable network of filaments that can be appreciated when diagrams like the one shown 
in Fig.[T]are depicted. All filaments are not equal, they show different size and contrast, but 
typically relatively empty voids are found between them. Different scales are involved in the 
filamentary patterns, and it is of great importance to have algorithms to identify them and 
to characterise their properties. We should mention here that the gradual disappearance 
of structures with distance observed in Fig. [T] is a selection effect due to how this surveys 
are built. They are flux-limited samples, and since the apparent luminosity of a galaxy 
is fainter if the galaxy lies further away, only the brightest galaxies are seen in the more 
distant regions of the surveyed volume. 

Certainly filaments are prominent and visually they dominate the galaxy maps, however 
there are still no standard methods to describe the observed filamentary structure. Second- 
order summary statistics like the two-point correlation function, the if-function or the power 
spectrum (in Fourier space) do not provide morphological information. Minkowski func- 
tionals, minimal spanning tree ( MST), percolation and sha pefinders have been introduced 
for this purpose (for a review see iMartmez and Saai ( 20021 )'). 

The minimal spanning tree was introduced in cosmology bv lBarrow et all (|l985[ ). It is 
a unique graph that connects all points of the process without closed loops, but certainly 
describes mainly the local nearest-neighbour distribution, being unable to provide a whole 
characterisation of the global and large-scale properties of the filamentary network. 

A better method , named skeleton, has been recently proposed (jEriksen et al.l (|2004| ): 
Novikov et al. ( 20061 )) to describe the possible filamentary structure of continuous density 



fields. The skeleton is determined by segments parallel to the gradient of the field, con- 
necting saddle points to local maxima. Calculating the skeleton involves interpolation and 
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smoothing the point distribution, which introduces an extra parameter, the band-width of 
the kernel function used to estimate the density field from the point distribution, typically a 
Gaussian function. In any case, this method has been recently applied to the Sloan Digital 
Sky Survey (Sousbie et alj (|2006( ). providing, by means of the length of the skeleton, a good 
discriminant tool for the analysis of the filamentary structures. 

In a previous paper (IStoica et al. we proposed to use an automated method 

to trace filaments for realisations of point processes, th at has been shown to work well for 



detection of road networks in remote sensing situations IjLacoste et al.l (|2005l ): IStoica et al 



(|2002. 2004)). This method is based on the Candy model, a marked point process where 
segments serve as marks. The Candy model can be applied to 2-D filaments, and we tested 
it on simulated galaxy distributions. The filaments we found delineated well the filaments 
detected by eye. 

Many methods used to automatically detect filaments have been developed so far for 
two-dimensional ma ps. Th i s is a natural approach for studying the cosmic microwave sky 

background (Eriks cn et al.l (j200j)), which is tw o-dimensional, but galaxy maps are three- 

dimen s ional. The previous filarn e nt stu dies fe.g.. lBharadwai et al. I (|2004l) :i iBharadwai and PandevI 
(2004); Pandev and Bharadwai ( 20051 )) have also used two-dimensional galaxy maps, pro- 
jections for thin slices. The main reason for that is that most of the past large-scale galaxy 
maps were observed for relatively thin spatial slices; also, in projection filaments seem more 
prominent. But, of course, both slicing of filaments and projecting the distribution onto a 
plane distorts the geometry and the properties of the filamentary network. 

The study of the three-dimensional filamentary network has just begun. Three-dimensio- 
nal filaments h ave be en extracted from galaxy distribution as a result of special observa- 
tional projects (IPimbblet and Drinkwater (2004 )). or by searching for filaments in the 2dF- 
GRS catalogue (jPimbblet et al .1 (|2004h ;i. These filaments have been searched for between 
galaxy clusters, determining the density distribution and deciding if it is filamentary, indi- 
vidually for ev e ry fila ment. Similar studies have been carried out for N-body simulations 
( Colberg et al.l (12005)). A review of these and previous studies of large-scale filaments is 
given in ("Pimbbletl (|2005h ). 

No automated methods to trace filaments in the three-dimensional galaxy maps have 
been proposed so far. Based on our previous experience with the Candy process, we gener- 
alised the approach for three dimensions. As the interactions between the structure elements 
are mo re complex in three d imensions, we had to define a more complex model, the Bisous 
model (|Stoica et al.l (|2005al )). This model gives a general framework for the construction 
of complex patterns made of simple interacting objects. In our case, it can be seen as a 
generalisation of the Candy model. We will describe the Bisous model below and will apply 
it to the samples chosen from the real three-dimensional galaxy distribution, the 2dFGRS 
catalogue. 



2. Object point processes: definitions and manipulation tools 

2. 1. Definitions 

Let {K, B, I') be a measure space, where K is a. compact subset of M.^ with strictly positive 
Lebesgue measure < ly'iK) < oo and B the associated Borel cr— algebra of subsets of 
K. If, to points in K, shape descriptors or marks are attached, objects are formed. Let 
(M, Ai, I'm) be the probability measure space of these marks. 

The considered configuration space is = U^o'^nj '^i^h S„ the set of all unordered 
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configurations y = {(fci,mi), (^2,^2), ■ • ■ , (fc„,TO„)} tliat consist of n not necessarily dis- 
tinct objects Hi = (ki,mi) € K x M. Sq is the empty configuration. Q is equipped with 
the cr— algebra J- generated by the mappings that count the number of objects in Borel sets 
AC K X M. 

A marked point process with locations of objects in K and marks in M is a measurable 
mapping from some probability space into (0,^^). An object point process is a marked 
point process with marks representing shape descriptors of the objects. 

For simplicity, throughout this paper the shape of an object is defined by a compact set 
s{yi) = s{ki, nii) that is a subset of of finite volume v{s{yi)). The shape of the object 
configuration y is defined by the random set Z{-y) = U"i^^^s(j/i). 

The simplest object point process is the Poisson object point process. This process is 
used as a reference measure. It generates a configuration of objects as follows: first the 
number of objects is chosen according to a Poisson law of intensity v{K), then the locations 
of objects are distributed uniformly in K and finally the corresponding shapes are chosen 
independently for each object with respect to 

The Poisson object point process does not take into account interactions between objects. 
Indeed, more realistic models can be constructed by specifying a probability density with 
respect to the reference measure: 

p(y|0) =aexphC/(y|^^)] (1) 

with the normalising constant a, the model parameter vector 9 and the energy function of 
the system U{y\9). There is a lot of freedom for constructing the energy function, provided 
there exists a positive real constant A > such that 

c/(y|^)-c/(yu{(fc,m)|^)}<iogA (2) 

The condition Q is known in the literature as the local stability property and it im- 
plies the integr ability with respect to the reference measure, of the probability density given 
by ll]). Furthermore, the local stability property is of major importance in esta blishing con- 
vergence proofs for the Monte Carlo dynamics simulating such a model iGeveil fl999 ). The 
quantity exp\U{y\6) — C/(y U {(fc, m)\6)] is usually called the Papangelou or the conditional 
intensity ratio. 

For further details related to the definition and the properties of ob j ect point processes 

we rec o mmend the r eader the following monographs ( van Lieshout ( 20001 ) : [M0ller and Waagepetersen 
(l2003l) : lReisj (Il993l )1. 



2.2. Manipulation tools: sampling and inference 

Several Markov chain Monte Car l o Ma r kov techniqu es are availabl e to simulate object 
point p rocesses (ICever and MolleiJ (ll994l):[Geveil ('l999'):' Greenl (ll995l) : |Kendall and Moller 
(|2000[) : lvan LieshoutI (|2000l ): Ivan Lieshout and Stoica (.20061 ): iPrestonI (|l97l1 )). 

In this paper, we need to sample from the joint law p{y,0). This is done using an 
iterative Monte Carlo algorithm. An iteration of the algorithm consists of two steps. First, 
a parameter value is chosen with respect to p{9). Then, conditi onally on 6, the pattern 



npi ea trom iP(y | a] using a ly ietr 
1: IStoica et all (|2004 [2005allbl )). 



is samp l ed from p(y_ \ 9) using a M etropolis-Hastings algorithm (jvan Lieshout and Stoica 
(l2003c " 



For the problem on hand, d, the data to be analysed, consist of points (galaxies) spread 
in a finite volume K . We want to detect the filamentary pattern "hidden" in these data. 
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Two hypotheses are assumed. First, this rather complex pattern is supposed to be formed by 
simple interacting objects. Second, the filamentary pattern is considered to be a realisation 
of an object point process. The energy function of such a process can be written as follows: 

U{y\0) = Ud{y\e) + U^{y\0) (3) 

where Ud{y\d) is the data energy and Ui{y\d) the interaction energy. 

The estimator of the filamentary structure in a field of galaxies together with the param- 
eter estimates are given by the configuration of objects minimising the total energy function 
of the system 

{y,9) = argmaxp(y,6') = argmaxp(y|6')p(6') 

= argmm{Ud{y\e) + U^{y\d)~\ogp{e)} (4) 

with the model parameters space. 

The minimisation of the energ y func ti on can be performed by means of a simulated 



ine mmimisation or tne energ y tunc ti on can be periormed Dy means ol a simulated 
annealing algorithm (jvan LieshoutI (|l994) : IStoica et all (|2005al) ). This method iteratively 



samples from p{y,9y^^ while slowly decreasing the temperature parameter T. When the 
system is frozen i.e. T —> 0, the simulated annealing samples uniformly on the sub-space of 
configurations minimising the energy function ([3]). 

In this case, the solution obtained is not unique. Hence, it is legitimate to ask if an 
elemen t of the pattern real ly belongs to the pattern, or if its presence is due to random 
effects (jStoica et al.l (|2005at )). 



For compact regions iS C R-^ of finite volume < v{S) < oo, we write the probability 
that an object from the pattern y covers S, as follows: 

n(Y) ^ \ r n(Y) ^ 



The cover probability given by ([5]) can be approximated by its Monte Carlo counterpart: 

' ( "(Y) ] \ U ( n(Y„) ^ 

l\ J2 1{5 C .(r,)} > U =77EM E i{5 C ,s(K„) > 0} [ (6) 




u—1 I i—1 

where Yi, Y^/ are obtained sampling from p(y, 9). For the pattern Y,(, the set with the 
corresponding objects is {Yui, i = 1, . . . , n{Y{u))}. 

The formula ((5|) can be seen as a tool for studying the average behaviour in terms of 
location and shape of the unknown pattern exhibited by the data. 

There are different choices for the study of the average behaviour of the shape exhibited 
by the data. For instance, we may compute the probability that a given compact region 
would be covered by the union of the objects, or the probability that an object intersects 
the given region. We have discarded these alternatives because they would lead to an 
over-detection of the filamentary pattern, similar to morphological dilation, and hence to 
filaments which are too thick. 
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3. Bisous model applied to filamentary pattern detection 



Detection of cosmic filaments using marked point processes was first performed on two- 
dimensional simulated data by IStoica et alj ( 2005b( ). The marked point process that was 
used throughout the mentioned work was the Candy model. This marked point process is 
able to simulate and detect two-dimensional linear networks. Clearly, for real data analysis, 
a three-dimensional model is needed. 

In this paper, we use the Bisous model, which is an o bject point process b uilt to model 
and analyse general three dimensional spatial patterns ( Stoica et al. ( 2005al )). The spa- 
tial pattern is made of simple interacting objects or generating elements. The particular 
property of these objects is that they exhibit a finite number q of rigid extremity points 
{ei, . . . jCg}. Around each object y an attraction region a(y) is defined. This attraction 
region is the disjoint union a[y) = U^^]^6(e„, Tq), where h{eu, ra) is the ball of fixed radius 
Va centred in Cu- Objects attract each other through their attraction region. If they attract 
each other, then they may get connected. Connectivity is a more complex interaction than 
attraction. We will show later in this paper that for instance, connectivity takes also into 
account the orientation between the main symmetry axes of the objects. The connected 
objects form the pattern. The object connected to the pattern by means of s extremity 
points is called s-connected. Among the interactions exhibited by the objects forming the 
pattern we enumerate connectivity, alignment and repulsion. They are to be defined later 
in the next section of the paper. 

For a pattern y the probability density of the Bisous model can be written as follows: 



p(y|| 



"(y) 

n 



i=l 



n 

.s=0 



,"s(y) 



n 



(7) 



where f3 : K x M ^ M+ is a positive function and 7s > 0, 7^ G [0, 1] are the interaction 
model parameters. The function P{y) is defined below by logP{y) — u{y), where u{y) is 
given by IjlOp. F is a set of interactions between objects that are pairwise, local (distance 
based), symmetric and guarantee that the density ([7]) is well defined. For each s and k 
the sufficient statistics ns{y) and ni^{y) represent, respectively, the number of s-connected 
objects and the number of pairs of objects exhibiting the interaction of type k in the 
configuration y. 

The total energy of the model U{y\9) is computed taking the negative logarithm function 
of ([7]). We naturally define its two components, the data energy 



"(y) 

Udiy\0) = -J2\ogP{y, 



(8) 



and the interaction energy 

Utiy\0) = -^ns(y) log 7s - ^^^(y) log 7k. 



(9) 



In the following, we will present the generating element together with its corresponding 
interactions that define ([9]), and will define the data energy (jS]), which is related to the 
location of the filamentary pattern among the galaxies. 
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3. 1. Generating element and its interactions 

The generating element of the filamentary pattern "hidden" in the galaxy data is a random 
thin cylinder. Such a cylinder has a random centre in K and has a fixed radius r and a 
height h, with r <^h. Its random mark is given by w, the orientation vector of the cylinder. 
Its corresponding parameters w = (/)(77, r) are uniformly distributed on M = [0, 2tt) x [0, 1] 
such that 

UJ = [sj 1 — cos(?7), \/l — sin(77), r). 

A cylinder [k^oj) has q = 2 extremity points. In Figure [5] the cylinder is centred in the 
origin and its symmetry axis is parallel to Oz. The coordinates of the extremity points are 



e„ = (0,0,(-l)"+i(-+r„)), 



e{i,2} 



and the orientation vector is cj = (0,0,1). A randomly located and oriented cy linder is 
obtained by the means of a translation and two rotations ( Hearn and Baker ( 1994 )1. 



A' 




attraction region 



Fig. 2. Generating element for the filamentary pattern. 



Two cylinders yi = (fci, wi) and j/2 = {k2,^2) attract each other, and we write yi ~a 2/2 
if the set 

: 1 < M, u < 2,u ^ v,d(eu[yi),ey{y2)) < ra} 

contains only one element. 

The same cylinders reject each other, and we write yi y2 if '^(^i, fe) < h. They are 
aligned, and we write yi ^|| ?/2 if 

UJi ■ UJ2 > I — T|| 

where T|| G (0, 1) is a predefined curvature parameter and • designs the scalar product of 
the two orientation vectors. If 

\U)1 ■ U!2\ <T± 
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where G (0, 1) is a predefined crossing curvature, they are said to be orthogonal, and we 
write yi 2/2- 

Two cyhnders yi and ?/2 are connected and we write yi ~c 2/2 if the following conditions 
are simultaneously fulfilled: 

2/1 ~a 2/2 
2/1 /r 2/2 

2/1 ~|| 2/2 

If the following conditions arc simultaneously verified: 



2/1 ~r 2/2 

2/1 '/x 2/2 



the cylinders exhibit a hard repulsion, and we write 2/1 2/2- 

The filaments forming the pattern are made of cylinders that attract each other and 
they are well aligned. These filaments may cross. At the same time configurations with 
overlapping cylinders are not desirable since they may outline clusters of galaxies instead 
of filaments. 

In Figure [3] a cylinder configuration is represented in two dimensions. According to the 
previous definitions, we observe that ci ~c C2 since ci C2, ci C2, and Ci ~|| C2. Thus, 
the cylinders ci and C2 are s-connected. In order to form filaments, this kind of interaction 
is encouraged by the model. The cylinders C2 and C3 reject each other, but they are rather 
orthogonal, so their position is not penalised by the model. The cylinders C2 and C4 are 
not s-connected since C2 7^0 C4, and their position is not encouraged nor penalised + by the 
model. The cylinders C4 and C5 reject each other, C4 ~r C5, and simultaneously they tend 
to overlap, C4 •/x C5. So, this pair of cylinders exhibits a hard repulsion interaction and 
this configuration is strongly penalised by the model. 




Fig. 3. Two-dimensional representation of a configuration of interacting cylinders. 



With respect to these considerations, all the ingredients needed for the natural construc- 
tion of the interaction energy ^ are defined. The s-connectivity is constructed using the 
relation ~c- The T set contains here only one element given by '^h- The model parameters 
for the interaction energy are 70, 71, 72 > and 7/1 G [0, 1]. The definition of the interactions 
and the parameter ranges ensure the object point process induced by the inter action energy 



to be w ell defined, locally stable and Markov in the sense of Ripley-Kelly (jStoica et al 
(2QQ53))- 
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3.2. The data energy 

The data energy is related to the position of the filamentary pattern in the galaxy field. To 
each cylinder y an extra cylinder is attached, so that it has exactly the same parameters as 
y, except for the radius which equals 2r. Let s(y) be the shadow of s{y) obtained by the 
substraction of the initial cylinder from the extra cylinder, as indicated in Figure |4l The 
cylinder y is divided along its main symmetry axis, in three equal volumes, and we denote 
by Si(i/), S2{y) and s^{y) their corresponding geometrical shapes. 




Fig. 4. A two-dimensional view of a cylinder with its shadow in a pattern of galaxies. 



A cylinder may belong to the pattern if several conditions are verified. First, the density 
of galaxies inside s{y) has to be higher than the density of galaxies in s{y), and it can be 
written as follows: 

l{"density"} = l{n(d n s{y))i^{:s{y)) > n{d H S{y))iy{s{y))} 

where n(d H s{y)) and n(d H s{y)) represent the number of galaxies covered by the cylinder 
and its shadow, and i'{s{y)) and i/(s(y)) are the volumes of the cylinder and its shadow, 
respectively. Next, the galaxies covered have to be spread inside the whole volume. This is 
formulated below: 

3 

l{"spread"} ^l[l{n{dn s,{y)) > 1} 

with n(d n Si(y)) the number of galaxies belonging to Si{y). Under these assumptions, u(jj) 
the energy contribution of a cylinder is defined as follows: 

u{y) = 1{ "density" }1{ "spread" }(n(d n s{y)) - n{d n s{y) + Umax) - "max (10) 

where Wmax is a positive fixed quantity. The introduction of this term gives to a segment 
that does not meet all the previous conditions, a very small chance to be a part of the 
network. This should give more complete networks and better mixing properties to the 
method. 
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The data energy ([5]) is obtained by summing the energy contributions ^TU\i for aU the 
cyhnders in the pattern: 

"(y) 

C/d(y|e) = -E"(yO- (11) 
1=1 

It can be checked that the data energy term pip induces a locaUy stable object point 
process. 



4. Data 

4. 1 . Observational data 

The best available redshift catalogue to study morph ology of the galaxy distribution at 



galaxy 

present is the 2dF Galaxy Redshift Survey (2dFGRS) (jColless et all 120011 ). The catalogue 
covers two separate regions in the sky, the NOP (North Galactic Cap) strip, and the SGP 
(South Galactic Cap) strip, with total area of about 1500 square degrees. The nominal 
(extinction-corrected) magnitude limit of the catalogue is bj = 19.45; reliable redshifts were 
obtained for 221414 galaxies. As seen in Fig.[l] the effective depth of the catalogue is about 
z = 0.2 or D « 600 h'^ Mpc, and the total volume F w 3.3 • lO^/i"^ Mpc^. 

As all galaxy redshift catalogues, the catalogue is not strictly homogeneous. First, due 
to different observing conditions, the magnitude limit (the depth of the catalogue) changes 
from observation to observation, but the changes are known and well documented. Second, 
due to the fact that the fibbers that lead the light from the focal plane to the spectrograph 
have a finite size, the redshifts of a few per cent of selected galaxies could not be measured. 
This effect is quantified as redshift completeness. All the data and the programs to calculate 
the magnitude limits and the spectroscopic completeness are public and can be found at 
http : //www .mso . anu . edu . au/2dFGRS/. 

The 2dFGRS catalogue is flux-limited catalogue and therefore the density of galaxies 
decreases with distance. For statistical analysis of such surveys, a weighting scheme that 
compensates for the missing galaxies at large distances, has to be used. Usually , each 



galaxy is weighted by the inverse of the selection function ([Martinez and Saan . l2002t ). The 
selection function is determined by the two main selection effects described above. However, 
such a weighting is suitable only for specific statistical problems, as, e.g., the calculation of 
correlation functions. When studying the local structure, such a weighting cannot be used; 
it would only amplify the shot noise. 

At the cost of discarding many surveyed galaxies, one can alternatively use volume- 
limited samples. In this case, the variation in density at different locations depends only 
on the fluctuations of the galaxy distribution itself. We started our analysis by using th e 
volume- limited samples prepared by the 2dF team for scahng studies ( Croton et al. ( 2004f )) 



and kindly sent to us by Darren Croton. Unfortunately, we found soon that the galaxy 
density in these samples was too low and the shot noise did not allow algorithms to flnd 
fllaments; there are, typically, only of the order of ten galaxies per filament is such samples. 

Thus we choose another way - we started from a full galaxy sample, and choose from that 
bricks in a limited distance range, where the galaxy density was approximately constant. 
We use bricks in this paper to avoid problems with complex boundaries of the full 2dfGRS 
sample. As the SGP half of the galaxy sample has a convex geometry (it is limited by two 
conical sections of different opening angles), the constant density bricks that is possible to 
cut from the SGP have small volumes. Thus we used only the NGP data, and chose three 
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Table 1. Galaxy content and geometry for the 2dFGRS bricks (in 
/i^^Mpc). TVggi is the number of galaxies in a sample, Dmm is the 
distance of a sample from the observer (its nearest point), Dmax is the 
maximum distance in a sample from the observer (to the far-away brick 
corners) and d is the mean distance between galaxies in a sample. 



sample 




-^miii 


depth 


width 


height 


-^max 


d 


NGP150 


2499 


87.9 


53.1 


101.5 


12.4 


150.0 


2.99 


NGP200 


4180 


117.2 


70.9 


135.3 


16.5 


200.0 
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NGP250 


7588 
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88.6 
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20.7 


250.0 


3.44 



0.12 
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Fig. 5. Galaxy density for the full 2dFGRS NGP sample (in comoving coordinates) versus the dis- 
tance D from the observer. As the nearby volume is small, single superclusters cause strong spikes 
in the histogram. The most prominent supercluster that creates the spike at D ^ 250 h^^Mpc, lies 
just outside of the largest data brick used in this work. 

bricks of maximum length and height, with different depths. General characteristics of these 
data sets are listed in Table [1] As seen from the (comoving) galaxy density histogram in 
Fig- El the densities inside the bricks are approximately constant. 

We show one of these three bricks inside the 2dFGRS NGP sample limits and the spatial 
distribution of galaxies in Fig. [6l 

4.2. Experimental results 

As described above, we use three data sets, drawn from the galaxy distribution in the 
Northern subsample of the 2dFGRS survey. The size of the brick determines the size of 
K. As the data resolution is similar (column d in Table [1] we choose the same values for 
the dimensions of the cylinder for all the data sets, as follows: r = 0.5, h = 6.0. The 
radius of the cylinder is close to the minimal one can choose, taking into account the data 
resolution. Its height is also close to the shortest possible, as our shadow cylinder has to 
have a cylindrical geometry, too (the ratio of its height to the diameter is presently 3:1). 
We choose the attraction radius as — 0.5, giving for the maximum distance between 
connected cylinders the value 1.5, and the pre-defined curvatures are r|| = t± = 0.15. This 
allows for a maximum of « 30° between the direction angles of connected cylinders, and 
limits the orthogonality condition to angles larger than « 80°. 




Fig. 6. One of the cuboidal samples analysed in this paper drawn from the Northern Galactic Cap of 
the 2dFGRS. The wedge shows the approximate extent of the full sample, its real borders are more 
complex. 



The data energy parameter is Wmax — —25. For the interaction energy, the parameter 
domain is chosen as foUows: log 70 G [—12.5,-7.5], log 71 G [—5,0] and log 72 £ [0,5]. 
The hard repulsion parameter is 7/1 — 0, so the configurations with cylinders exhibiting 
such interactions are forbidden. The domain of the connection parameters was chosen such 
that 2-connected cylinders are generally encouraged, 1-connected cylinders are penalised 
and 0-connected segments are strongly penalised. This choice encourages the cylinders to 
group in filaments in those regions where the data energy is good enough. Still, we have no 
information about the relative strength of those parameters. Therefore, we have decided to 
use for the prior parameter density p{0) the uniform law over the parameter domain. 

A delicate point when building a simulated an nealing proc e dure is the choice of an 
appropriate cooling schedule for the temperature. (jStoica et al. I (l2005al) 'l proved that the 



use of a logarithmic cooling schedule guarantees the convergence of the simulated annealing 
algorithm when the parameters of the point process are fixed. Therefore, we have chosen 
the following scheme for the descent of the temperature: 

rp To 

-L ri 



1 + log n 



with To = 1. 

We ran the simulated annealing algorithm for 250000 iterations. Samples were picked 
up every 250 iterations. The obtained cylinder configurations for the data sets A^G'P150, 
NGP200 and NGP250 are shown in Figure [7)3, Figure [8)3 and Figure [9)d, respectively. 

The cylinders obtained after running the simulated annealing are situated in the regions 
where we see that the data exhibit filamentary structures. Still, as simulated annealing 
requires infinitely many iterations till convergence, and also because of the fact that an 
infinity of solutions are proposed, we shall use coverage probabilities to "average" the shape 
of the filaments. 

For each data set the corresponding domain K was divided into small cubic cells of size 
1 h^^Mpc. These cells are small enough to be covered by a single cylinder. Computing the 
coverage probabilities directly using simulating annealing outputs is not possible since ([6]) 
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Table 2. The mean of the sufficient statistics for the three 
data sets: m is the mean number of the 2-connected 
cylinders, m is the mean number of the 1-connected cylin- 
ders and no is the mean number of the 0-connected cylin- 
ders. 



Sufficient statistics 


Data sets 


NGP150 


NGP200 


NGP250 


n2 


4.13 


5.83 


9.88 


no 


15.88 


21.19 


35.82 


m 


21.35 


35.58 


46.49 



is an expectation under p{y, 9). Hence, the algorithm was run until Tq, a fixed temperature 
close to zero, was reached. The coverage probabilities were computed as an expectation 
similar to (O but under the law given by p(y, 9^/^° . The obtained result was thresholded 
using three distinct values: 0.5, 0.75 and 0.95. For each of these values we obtain a map 
of the cells that have been visited by our model with a frequency higher or equal than the 
given value. These visit maps for to the data sets 7VG'P150, NGP20Q and 7VGP250 are 
shown in Figure [7t-d, Figure [8t-d and Figure [It-d respectively. 

The filamentary network is clearly outlined. Here we consider a filament the geometric 
structure obtained recursively from a visited cell and its corresponding neighbouring cells. 
In this case, we have defined the neighbours of a cell, the cells that have coordinates in- 
cremented or decremented by one with respect to the reference cell. Hence, a cell has 26 
neighbouring cells. 

Although many filaments are of simple form, there are many filaments which exhibit 
complex morphology, from simple branching to multibranch complexes. This is a new fact 
that will probably force cosmologists to reconsider their usual notion of filaments as simple 
bridges between clusters of galaxies. We show two examples of such filaments below, one of 
a comparatively simple shape fFig. [TU|) . and another of very complex shape (Fig. [TT|). 



4.3. Statistics 

The coverage probabilities are not a global test for our method, because these probabilities 
are computed only for small regions. The computation of the probability that a whole 
filament is covered by our model requires the knowledge a priori of the shape of the filament. 
( Stoica et al.l (|2005al )) used for a different problem - cluster detection in epidemiological 
data - a test, to check that the pattern is detected rather because of the data than of the 
model parameters. 

Following this idea, the following experiments were carried out for each data set. First, 
the method was launched during 50000 iterations at fixed T = 1.0. Samples were picked 
up every 250 iterations. The means of the sufficient statistics of the model were computed 
using these samples. The obtained results are shown in Tabled 

The second experiment consisted of re-distributing uniformly the points inside the do- 
main K. So, the points follow a binomial distribution. For each data set this operation 
was done 100 times, hence obtaining 100 point fields. For each point field the method was 
launched in the conditions previously described. The mean of the sufficient statistics was 
then computed. The maximum values over all the 100 means for each data set are shown 
in Table [1 




Fig. 7. Data set iVGP150. a) The data, b) Cylinder configuration obtained using the simulated 
annealing algorithm. Cover probabilities thresholded at c) 50%, d) 95%. 




Fig. 8. Data set 7VGP200. a) The data, b) Cylinder configuration obtained using tlie simulated 
annealing algorithm. Cover probabilities thresholded at c) 50%, d) 95%. 
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Fig. 9. Data set NGP250. a) The data, b) Cylinder configuration obtained using tlie simulated 
annealing algorithm. Cover probabilities thresholded at c) 50%, d) 95%. 
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Fig. 10. A simple filament from the 7VGP250 sample. Lighter shading (green colour) shows the 50% 
cover probability threshold, darker shading (red colour) - the 95% threshold. 




Fig. 11. A complex filament from the NGP150 sample. Lighter shading (green colour) shows the 
50% cover probability threshold, darker shading (red colour) - the 95% threshold. 



Table 3. The maximum of mean of the sufficient statis- 
tics over binomial fields generated for each of the three 
data sets: max n2 is the maximum mean number of the 2- 
connected cylinders, maxni is the maximum mean num- 
ber of the 1-connected cylinders and max no is the maxi- 
mum mean number of the 0-connected cylinders. 



Sufficient statistics 


Binomial data sets 


NGP150 


NGP200 


NGP250 


max 712 


0.015 


0.05 


0.015 


max 710 


0.54 


0.27 


0.45 


max7ii 


0.39 


0.24 


0.33 
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These results clearly indicate that the original data exhibit a filamentary structure. No 
filamentary structure is detected when the same model runs over binomial fields of points 
that has the same number of points as the original data. This test discriminates the two 
cases at a p- value less than 1%, and assures us that the filaments we find are due to data 
and not to our model. 



5. Conclusion and perspectives 

We have applied an object point process - Bisons model - to objectively find filaments in 
galaxy redshift surveys (three-dimensional galaxy maps). For that, we defined the model, 
fixed some of the interaction parameters and chose priors for the remaining parameters. The 
definition of the data term is very intuitive and rather a simple test. Much more elaborated 
methods testing the alignment of the points along the direction of the cylinder against 
the completely spatial randomness need investigation. The uniform law for the interaction 
parameters was preferred in order to give the same chances to a wide range of topologies 
of the filamentary network. Still, if concrete prior information about the topology of the 
filamentary network is available then this should be integrated in the model. 

We have run simulated annealing sequences and select filaments on the basis of coverage 
probabilities for individual cells of sample volume. The coverage probabilities are to be seen 
as a way of averaging the shape of the filamentary structure. They have the advantage to 
allow inference from statistics instead of a single realisation. Their main drawback is that 
the coverage probabilities are computed locally, for small regions. Still, the visualisation 
of the visit maps built on these probabilities brings new ideas and hypotheses about the 
topologies of the cosmic filaments. A global Monte Carlo statistical test was built to test 
the existence of the filaments in the data. To do this, we have calculated sufficient statistics 
for the data sets and made a comparison with the sufficient statistics obtained on binomial 
point fields having the same number of points as the data. Our test was indicating that the 
filaments we find are defined by the data, not by the chosen model. 

The method used in this paper can be extended in different ways. One natural exten- 
sion is to use different genera ting elements instead on cylinders, e.g., planar elements or 
clusters. ( Stoica et al. ( 2005a[) V Although traces of planar structures are seen in superclus- 



ters of galaxies, these have been difficult to quantify, mainly because of their low density 
contrast. Another interesting application is to search for dynamically bound groups and 
clusters of galaxies that have a typical 'finger-of-God' signature in redshift space, extended 
along the line-of-sight. And there remains a question if the method could be extended to 
inhomogeneous point processes - this would allow us to use all the observational data, not 
only volume- limited subsamples. 
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